String Discovery for Private Analytics
Authors
Abstract
A number of research systems enable analysts to aggregate user data that is distributed across user devices while preventing online tracking and providing users with differential privacy guarantees. These systems rely on pre-defined string values to release relevant user data in a controlled fashion. Unfortunately, many string values (e.g., tags in a photo application) may not be easily predicted. Existing private aggregation systems that can be used to discover strings for private analytics purposes exhibit serious shortcomings, such as heavy client-side operations and an inability to deal with malicious clients supplying incorrect data. In this paper, we present a practical and privacy-preserving string discovery system that provides analysts with previously unknown strings and limits the effects of malicious clients, while supporting a variety of user devices with varying computation and bandwidth resources. To achieve these goals, our system employs the exclusive-OR (XOR) operation as its crypto primitive, and utilizes a novel method to determine the equivalence of two XOR-encrypted strings without revealing them. We present our design, analyze its privacy properties and evaluate its feasibility. Our results show that our system outperforms the closest system by several orders of magnitude for client-side computations and one order of magnitude for server-side computations.
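As a rough illustration of why XOR is attractive for resource-constrained clients, the sketch below masks a string with a random key of equal length, so that neither the masked value nor the key alone reveals the string. This is a generic XOR masking example, not the paper's protocol: the function names, the fixed 32-byte padding, and the two-part split are assumptions made for illustration, and the paper's method for comparing two XOR-encrypted strings is not reproduced here.

```python
import secrets
from typing import Tuple

PAD_LENGTH = 32  # hypothetical fixed length so all masked strings look alike


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def split_string(value: str, length: int = PAD_LENGTH) -> Tuple[bytes, bytes]:
    """Mask a string with a fresh random key; return (masked, key)."""
    data = value.encode("utf-8")
    if len(data) > length:
        raise ValueError("string too long for the fixed length")
    padded = data.ljust(length, b"\x00")  # zero-pad to the fixed length
    key = secrets.token_bytes(length)     # one-time random key
    return xor_bytes(padded, key), key


def recombine(masked: bytes, key: bytes) -> str:
    """Recover the string when both parts are available."""
    return xor_bytes(masked, key).rstrip(b"\x00").decode("utf-8")


masked, key = split_string("sunset")
print(recombine(masked, key))
```

The only client-side work here is drawing random bytes and a bytewise XOR, which is consistent with the abstract's emphasis on light client-side operations; a real deployment would also need to address key reuse, integrity, and malicious inputs, none of which this sketch covers.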
Similar resources
Private Record Linkage: Comparison of Selected Techniques for Name Matching
Grzebala, Pawel. M.S.C.E. Department of Computer Science and Engineering, Wright State University, 2016. Private Record Linkage: A Comparison of Selected Techniques for Name Matching. The rise of Big Data Analytics has shown the utility of analyzing all aspects of a problem by bringing together disparate data sets. Efficient and accurate private record linkage algorithms are necessary to achiev...
www.simularity.com The Simularity High Performance Correlation Engine
Why similarity analytics? Similarity analytics are the best analysis tool for discovery of insights from big data. The value is in getting the data to tell you things you didn't know. This is a challenge best solved by looking for connections in the data. You just can't do this discovery with the standard analytics that come with a data warehouse. And doing this type of discovery over large dat...
Big Data Analytics and Now-casting: A Comprehensive Model for Eventuality of Forecasting and Predictive Policies of Policy-making Institutions
The ability of now-casting and eventuality is the most crucial and vital achievement of big data analytics in the area of policy-making. To recognize the trends and to render a real image of the current condition and alarming immediate indicators, the significance and the specific positions of big data in policy-making are undeniable. Moreover, the requirement for policy-making institutions to ...
Law in the Age of Exabytes: Some further Thoughts on ‘Information Inflation’ and Current Issues in E-Discovery Search
* Director of Litigation, National Archives and Records Administration, Washington, D.C.; Co-Chair, The Sedona Conference Working Group on Electronic Document Retention and Production; Adjunct Professor, University of Maryland. B.A. magna cum laude, Wesleyan University (1977), J.D., Boston University School of Law (1980). This article represents a reworking and expansion of a presentation at th...
A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection
Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....